Measurement error data or errors-in-variable data have been collected in many studies. Natural 
criterion functions are often unavailable for general functional measurement error models due 
to the lack of information on the distribution of the unobservable covariates. Typically, the 
parameter estimation is via solving estimating equations. In addition, the construction of such 
estimating equations routinely requires solving integral equations, hence the computation is often 
much more intensive compared with ordinary regression models. Because of these difficulties, 
traditional best subset variable selection procedures are not applicable, and in the measurement 
error model context, variable selection remains an unsolved issue. In this paper, we develop a 
framework for variable selection in measurement error models via penalized estimating equations. 
We first propose a class of selection procedures for general parametric measurement error models 
and for general semi-parametric measurement error models, and study the asymptotic properties 
of the proposed procedures. Then, under certain regularity conditions and with a properly chosen 
regularization parameter, we demonstrate that the proposed procedure performs as well as an 
oracle procedure. We assess the finite sample performance via Monte Carlo simulation studies 
and illustrate the proposed methodology through the empirical analysis of a familiar data set. 

Keywords: errors in variables; estimating equations; measurement error models; non-concave 
penalty function; SCAD; semi-parametric methods 

1. Introduction 

In regression analysis, some covariates can often be measured only imprecisely or 
indirectly, which results in measurement error models, also known as errors-in-variable 
models in the literature. Various statistical procedures have been developed for statistical 
inference in measurement error models (Carroll, Ruppert, Stefanski and Crainiceanu 
(2006)). The study of linear measurement error models dates back to Bickel and Ritov 
(1987), where an efficient estimator is provided. Stefanski and Carroll (1987) constructed 
consistent estimators for generalized linear measurement error models. Recently, Tsiatis 
and Ma (2004) extended the model framework to an arbitrary parametric regression 
setting. Liang, Härdle and Carroll (1999) proposed partially linear measurement error 
models. Ma and Carroll (2006) studied generalized partially linear measurement error 
models. Active research has also developed recently in the nonparametric 
measurement error area; see, for example, Delaigle and Hall (2007) and Delaigle 
and Meister (2007). The goal of this paper is to develop a class of variable selection 
procedures for general measurement error models. We emphasize that the 
scope of the paper is not limited to generalized linear models. 

This study was motivated by examining the effects of systolic blood pressure (SBP), a 
covariate with error, and the effects of three other covariates - serum cholesterol, 
age, and smoking status - on the probability of the occurrence of heart disease. In 
our initial analysis, we include interactions between SBP and covariates, interactions 
among covariates, and quadratic terms of covariates to reduce modeling bias. Our 
preliminary analysis found that some interactions and quadratic terms are not significant 
and should be excluded to achieve a parsimonious model. In selecting significant variables 
for further analysis, we realized that the traditional Akaike information criterion (AIC) 
and Bayesian information criterion (BIC) are not well defined for the model we 
consider in Section 4.4. Recently, a class of variable selection procedures for partially 
linear measurement error models using penalized least squares and penalized quantile 
regression was proposed in Liang and Li (2009). However, their procedures are not 
applicable to cases beyond partially linear models, such as partially linear logistic regression 
models, and therefore the procedures in Liang and Li (2009) cannot be applied to 
the model in Section 4.4 either. In fact, variable selection for general parametric or 
semi-parametric measurement error models is challenging. One major difficulty is the lack of 
a likelihood function in these models, due to the difficulty in obtaining the distribution 
of the error-prone covariates. For example, using Y to denote the response variable, X 
to denote the unobservable covariate, and W to denote an observed surrogate of X, 
the likelihood of a single observation (w, y) is then $\int p_{Y|X}(y|x,\beta)\,p_{W|X}(w|x)\,p_X(x)\,\mathrm{d}x$. In 
order to calculate this likelihood, one will need to estimate $p_X$, yielding a deconvolution 
problem that is known to have a very slow rate (Carroll and Hall (1988), Fan (1991)) 
and is typically avoided in parametric measurement error models. Although a reasonable 
criterion function can be used in place of the likelihood, the difficulty persists in that, 
except for very special models such as in linear or partially linear cases, even a sensible 
criterion function is unavailable. In other models such as the ones that arise in survival 
analysis, the lack of a likelihood function also causes a problem. To perform variable 
selection in these models, rather complicated methods have been proposed where for 
each potential model, one needs to fit the model, derive the asymptotic properties of the 
estimator, form some artificial criterion function based on the asymptotic properties of 
the estimators, and finally add a penalty to perform the procedure. The procedure is 
complicated and unnatural. This motivates us to develop some simple variable selection 
procedures for measurement error models when a reasonable criterion function is un- 
available. Although a few variable selection procedures exist for linear or partially linear 
measurement error models (Liang and Li (2009)), to the best of our knowledge, variable 
selection for general parametric or semi-parametric measurement error models has never 
been systematically studied in the literature. This paper intends to fill this gap by developing 
a class of variable selection procedures for both parametric and semi-parametric 
measurement error models. In addition, the method proposed here is applicable to the 
more general situation where the likelihood or a natural criterion is not available, and 
the estimation is performed through solving a set of estimating equations. 

The variable selection procedure we propose is indeed a penalized estimating equation 
method that can be applied for both parametric and semi-parametric measurement error 
models. In addition, the penalized estimating equation method is applicable to any set of 
consistent estimating equations. Note that here the measurement error model we consider 
is completely general and not limited to generalized linear models. Variable selection and 
feature selection are very active research topics in the current literature. Candes and 
Tao (2007) and Fan and Lv (2008) have studied variable selection for linear models 
when the sample size is much smaller than the dimension of the regression parameter 
space. Their results are inspiring, but only valid for linear models with very strong 
assumptions on the design matrix or the distribution of covariates. Thus, in this paper 
we follow Fan and Peng (2004) and consider the statistical setting in which the number 
of regression coefficients diverges to infinity at a certain rate as the sample size tends to 
infinity. We systematically study the asymptotic properties of the proposed estimator. 
It is worth pointing out that the theoretical results in this paper explicitly characterize 
the asymptotic properties when the dimension of regression coefficients increases as the 
sample size increases. This advances the results in current literature, where estimation 
and inference are studied only for fixed finite-dimensional parameters for measurement 
error models. In our asymptotic analysis, we show that with a proper choice of the 
regularization parameters and the penalty function, our estimator possesses the oracle 
property, which roughly means that the estimate is as good as when the true model 
is known (Fan and Li (2001)). We also demonstrate that the oracle property holds in a 
simpler form for the more familiar setting where the true number of regression coefficients 
is fixed. 

In addition, we address issues of practical implementation of the proposed methodol- 
ogy. It is desirable to have an automatic, data-driven method to select the regularization 
parameters. To this end, we propose generalized cross-validation (GCV)-type and BIC- 
type tuning parameter selectors for the proposed penalized estimating equation method. 
Monte Carlo simulation studies are conducted to assess finite sample performance in 
terms of model complexity and model error. From our simulation studies, both tuning 
parameter selectors result in sparse models, while the BIC-type tuning parameter selector 
outperforms the GCV-type tuning parameter selector. 

The rest of the paper is organized as follows. In Section 2, we propose a new class 
of variable selection procedures for parametric measurement error models and study 
asymptotic properties of the proposed procedures. We develop a new variable selection 
procedure for semi-parametric measurement error models in Section 3. Implementation 
issues and numerical examples are presented in Section 4, where we describe data-driven 
automatic tuning parameter selection methods (Section 4.1), define the concept of ap- 
proximate model error to evaluate the selected model (Section 4.2), carry out a simulation 
study to assess the finite sample performance of the proposed procedures (Section 4.3), 
and illustrate our method in an example (Section 4.4). Technical details are collected in 
the Appendix. 

2. Parametric measurement error models 

A general parametric measurement error model has two parts, written as 

$$p_{Y|X,Z}(Y|X,Z,\beta) \quad\text{and}\quad p_{W|X,Z}(W|X,Z,\xi). \tag{2.1}$$

Here, the main model is $p_{Y|X,Z}(Y|X,Z,\beta)$, which denotes the conditional probability 
density function (p.d.f.) of the response variable Y on the covariates measured with 
error X and the covariates measured without error Z. Note that here the conditional 
distribution of Y on the covariates is completely general, hence it includes many familiar 
regression families. For example, the linear model with normal error $Y = X\beta + \epsilon$, 
the logistic model $\Pr(Y = 1|X) = \exp(\beta_0 + X\beta_1)/\{1 + \exp(\beta_0 + X\beta_1)\}$, or the Poisson 
model $p_{Y|X}(Y|X) = \exp\{-(\beta_0 + X\beta_1)\}(\beta_0 + X\beta_1)^Y/Y!$ are all special forms of the 
model we consider. The error model is denoted $p_{W|X,Z}(W|X,Z,\xi)$, where W is an observable 
surrogate of X. The parameter $\beta$ is a d-dimensional regression coefficient, $\xi$ is a 
finite-dimensional parameter, and our main interest is in selecting the relevant subset of 
covariates in X and Z and estimating the subsequent parameters contained in $\beta$. Typically, 
$\xi$ is a nuisance parameter and its estimation usually requires multiple observations 
or instrumental variables. As routinely done in the literature, we assume that the model 
is identifiable. Furthermore, for simplicity, we assume in the main context of this paper 
that the error model $p_{W|X,Z}(W|X,Z)$ is completely known and hence $\xi$ is suppressed. 
The extension to the unknown $\xi$ case is rather straightforward and is discussed in Section 
5. The observed data are of the form $\{(W_i, Z_i, Y_i), i = 1, \ldots, n\}$. 

Denote $S^*_\beta$ as the purported score function. That is, 

$$S^*_\beta(W,Z,Y) = \frac{\partial \log \int p_{W|X,Z}(W|X,Z)\,p_{Y|X,Z}(Y|X,Z,\beta)\,p^*_{X|Z}(X|Z)\,\mathrm{d}\mu(X)}{\partial\beta},$$

where $p^*_{X|Z}(X|Z)$ is a conditional p.d.f. that one posits, which can be either equal or not 
equal to the true p.d.f. $p_{X|Z}(X|Z)$. Let the function $a(X,Z)$ satisfy 

$$E[E^*\{a(X,Z)|W,Z,Y\}|X,Z] = E\{S^*_\beta(W,Z,Y)|X,Z\},$$

where $E^*$ indicates that the expectation is calculated using the posited $p^*_{X|Z}(X|Z)$. Note 
that here and in the sequel a model $p^*_{X|Z}(X|Z)$ has to be proposed in order to actually 
construct the estimator. Define 

$$S^*_{\mathrm{eff}}(W,Z,Y) = S^*_\beta(W,Z,Y) - E^*\{a(X,Z)|W,Z,Y\}.$$

To select significant variables and estimate the corresponding parameters simultaneously, 
we propose the penalized estimating equations for model (2.1) as 

$$\sum_{i=1}^n S^*_{\mathrm{eff}}(W_i,Z_i,Y_i,\beta) - np'_\lambda(\beta) = 0, \tag{2.2}$$

where $p'_\lambda(\beta) = \{p'_\lambda(\beta_1), \ldots, p'_\lambda(\beta_d)\}^T$ and $p'_\lambda(\cdot)$ is the first-order derivative of a penalty 
function $p_\lambda(\cdot)$. Solving for $\beta$ from (2.2) gives the estimate of $\beta$. In practice, we may allow 
different coefficients to have penalty functions with different regularization parameters. 
For example, we may want to keep some variables in the model without penalizing their 
coefficients. For ease of presentation, we assume that the penalty functions and the 
regularization parameters are the same for all coefficients in this paper. 

The penalties in the classic variable selection criteria, such as AIC and BIC, cannot be 
applied to the penalized estimating equations. Following the study on the choice of the 
penalty functions in Fan and Li (2001), we use the smoothly clipped absolute deviation 
(SCAD) penalty, whose first-order derivative is defined as 

$$p'_\lambda(\gamma) = \lambda\left\{I(|\gamma| \le \lambda) + \frac{(a\lambda - |\gamma|)_+}{(a-1)\lambda}I(|\gamma| > \lambda)\right\}\mathrm{sign}(\gamma) \tag{2.3}$$

for any scalar $\gamma$, where $(t)_+ = \max(t, 0)$ and $\mathrm{sign}(\cdot)$ is the sign function, that is, 
$\mathrm{sign}(\gamma) = -1$, $0$ and $1$ when $\gamma < 0$, $= 0$ and $> 0$, respectively. Here, $a > 2$ is a constant, 
and a choice of $a = 3.7$ is 
appropriate from a Bayesian point of view. A property of (2.2) is that with a proper 
choice of penalty functions, such as the SCAD penalty, the resulting estimate contains 
some exact zero coefficients. This is equivalent to excluding the corresponding variables 
from the final selected model, thus variable selection is achieved at the same time as 
parameter estimation. 
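
As a concrete reading of (2.3), the following minimal Python sketch evaluates the SCAD derivative for a scalar argument. The function name `scad_derivative` and its argument names are illustrative choices rather than notation from the paper, and the formula is written without dividing by $\lambda$ so that $\lambda = 0$ (no penalty) is handled as well.

```python
def scad_derivative(gamma, lam, a=3.7):
    """Derivative p'_lambda(gamma) of the SCAD penalty, following (2.3).

    gamma : scalar coefficient value
    lam   : regularization parameter lambda (>= 0)
    a     : SCAD shape constant (> 2); 3.7 is the value suggested in the text
    """
    if lam < 0 or a <= 2:
        raise ValueError("need lam >= 0 and a > 2")
    size = abs(gamma)
    sign = (gamma > 0) - (gamma < 0)  # -1, 0 or 1
    if size <= lam:
        magnitude = lam  # constant, L1-like penalization near zero
    else:
        # lam * (a*lam - |gamma|)_+ / ((a - 1)*lam), written without dividing by lam
        magnitude = max(a * lam - size, 0.0) / (a - 1.0)
    return magnitude * sign


# Small coefficients are penalized at the constant rate lam; beyond a*lam the
# derivative vanishes, so large coefficients are left unpenalized.
for g in (0.2, 1.0, 2.0, -0.2):
    print(g, scad_derivative(g, lam=0.5))
```

The vanishing derivative for $|\gamma| \ge a\lambda$ is what distinguishes SCAD from the $L_1$ penalty, whose derivative stays at $\lambda$ for every non-zero coefficient.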

Concerns about model bias often prompt us to build models that contain many variables, 
especially when the sample size becomes large. A reasonable way to capture such 
a tendency is to consider the situation where the dimension of the parameter $\beta$ increases 
along with the sample size n. We therefore study the asymptotic properties of the penalized 
estimating equation estimator under the setting in which both the dimension of 
the true non-zero components of $\beta$ and the total length of $\beta$ tend to infinity as n goes 
to infinity. Denote $\beta_0 = (\beta_{10}, \ldots, \beta_{d0})^T$ as the true value of $\beta$. Let 

$$a_n = \max\{|p'_{\lambda_n}(|\beta_{j0}|)| : \beta_{j0} \ne 0\} \quad\text{and}\quad b_n = \max\{|p''_{\lambda_n}(|\beta_{j0}|)| : \beta_{j0} \ne 0\}, \tag{2.4}$$

where we write $\lambda$ as $\lambda_n$ to emphasize its dependence on the sample size n. 

Theorem 1. Suppose that condition (P1) in the Appendix holds. Under regularity conditions 
(A1)-(A3) in the Appendix, and if $d_n^4 n^{-1} \to 0$, $\lambda_n \to 0$ when $n \to \infty$, then 
with probability tending to one, there exists a root of (2.2), denoted $\hat\beta$, such that 
$\|\hat\beta - \beta_0\| = O_p\{\sqrt{d_n}(n^{-1/2} + a_n)\}$, where we write d as $d_n$ to emphasize its dependence 
on the sample size n. 

The proof of Theorem 1 is given in the Appendix. Theorem 1 demonstrates that the 
convergence rate depends on the penalty function and the regularization parameter $\lambda_n$ 
through $a_n$. From Theorem 1, achieving the root $(n/d_n)$ convergence rate requires 
$a_n = O(1/\sqrt{n})$. For the $L_1$ penalty, $a_n = \lambda_n$. Thus, the root $(n/d_n)$ convergence rate requires 
that $\lambda_n = O(1/\sqrt{n})$, while $a_n = 0$ as $\lambda_n \to 0$ for the SCAD penalty. Thus, the resulting 
estimate with the SCAD penalty is root $(n/d_n)$ consistent. 

To present the oracle property of the resulting estimate, we first introduce some notation. 
Without loss of generality, we assume $\beta_0 = (\beta_{I0}^T, \beta_{II0}^T)^T$, and that in the true model 
any element in $\beta_{I0}$ is not equal to 0 while $\beta_{II0} = 0$. Denote the dimension of $\beta_I$ as $d_1$. 
Furthermore, denote 

$$b = \{p'_{\lambda_n}(\beta_{10}), \ldots, p'_{\lambda_n}(\beta_{d_10})\}^T \quad\text{and}\quad \Sigma = \mathrm{diag}\{p''_{\lambda_n}(\beta_{10}), \ldots, p''_{\lambda_n}(\beta_{d_10})\}, \tag{2.5}$$

and the first $d_1$ components of $S^*_{\mathrm{eff}}(W,Z,Y,\beta)$ as $S^*_{\mathrm{eff},I}(\beta)$. In the following theorem, we 
use the same formulation as that in Cai, Fan, Li and Zhou (2005). 

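Theorem 2. Suppose that condition (P1) holds. Under regularity conditions (A1)-(A3), 
assume $\lambda_n \to 0$ and $d_n^5/n \to 0$ when $n \to \infty$. If 

$$\liminf_{n\to\infty}\,\liminf_{\gamma\to 0+}\sqrt{n/d_n}\,p'_{\lambda_n}(\gamma) \to \infty, \tag{2.6}$$

then with probability tending to one, any root n consistent solution $\hat\beta_n = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ of 
(2.2) must satisfy that: 

(i) $\hat\beta_{II} = 0$, 

(ii) for any $d_1 \times 1$ vector v, s.t. $v^Tv = 1$, 

$$\sqrt{n}\,v^T[E\{S^*_{\mathrm{eff},I}(\beta_{I0})S^{*T}_{\mathrm{eff},I}(\beta_{I0})\}]^{-1/2}\left[E\left\{\frac{\partial S^*_{\mathrm{eff},I}(\beta_{I0})}{\partial\beta_I^T}\right\} - \Sigma\right]\left(\hat\beta_I - \beta_{I0} - \left[E\left\{\frac{\partial S^*_{\mathrm{eff},I}(\beta_{I0})}{\partial\beta_I^T}\right\} - \Sigma\right]^{-1}b\right) \xrightarrow{D} N(0,1),$$

where the notation $\xrightarrow{D}$ stands for convergence in distribution. 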

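The proof of Theorem 2 is given in the Appendix. For some penalty functions, including 
the SCAD penalty, b and $\Sigma$ are zero when $\lambda_n$ is sufficiently small, hence the results in 
Theorem 2 imply that the proposed procedure has the celebrated oracle property: that 
is, $\hat\beta_{II} = 0$, and for any $d_1 \times 1$ vector v, s.t. $v^Tv = 1$, 

$$\sqrt{n}\,v^T[E\{S^*_{\mathrm{eff},I}(\beta_{I0})S^{*T}_{\mathrm{eff},I}(\beta_{I0})\}]^{-1/2}E\left\{\frac{\partial S^*_{\mathrm{eff},I}(\beta_{I0})}{\partial\beta_I^T}\right\}(\hat\beta_I - \beta_{I0}) \xrightarrow{D} N(0,1). \tag{2.7}$$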

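Theorems 1 and 2 imply that for fixed and finite d, $\|\hat\beta - \beta_0\| = O_p(n^{-1/2} + a_n)$ and 
with probability tending to one, any root n consistent solution $\hat\beta = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ of (2.2) 
must satisfy that $\hat\beta_{II} = 0$ and 

$$\sqrt{n}\left(\hat\beta_I - \beta_{I0} - \left[E\left\{\frac{\partial S^*_{\mathrm{eff},I}(\beta_{I0})}{\partial\beta_I^T}\right\} - \Sigma\right]^{-1}b\right) \xrightarrow{D} N\left(0, \left[E\left\{\frac{\partial S^*_{\mathrm{eff},I}(\beta_{I0})}{\partial\beta_I^T}\right\} - \Sigma\right]^{-1}E\{S^*_{\mathrm{eff},I}(\beta_{I0})S^{*T}_{\mathrm{eff},I}(\beta_{I0})\}\left[E\left\{\frac{\partial S^*_{\mathrm{eff},I}(\beta_{I0})}{\partial\beta_I^T}\right\} - \Sigma\right]^{-T}\right),$$

where the notation $M^{-T}$ denotes $(M^{-1})^T$ for a matrix M. These results are still valid 
under much weaker conditions. See an elaborated version of this paper, Ma and Li (2007), 
for details. 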

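For the SCAD penalty and for fixed and finite d, (2.7) becomes 

$$\sqrt{n}(\hat\beta_I - \beta_{I0}) \to N\{0, E(\partial S^*_{\mathrm{eff},I}/\partial\beta_I^T)^{-1}E(S^*_{\mathrm{eff},I}S^{*T}_{\mathrm{eff},I})E(\partial S^*_{\mathrm{eff},I}/\partial\beta_I^T)^{-T}\}$$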

in distribution. In other words, with probability tending to 1, the penalized estimator 
performs in the same manner as the locally efficient estimator under the correct model. 



3. Semi-parametric measurement error models 

To motivate the problems considered in this section, we start with some commonly used 
semi-parametric regression models for which the proposed procedure in this section can 
be directly applied. Consider first the error-free regression cases, and let Y be the response 
and Z and S be the covariates. Throughout this paper, we consider univariate Z only. 
Consider the partially linear model defined as follows: 

$$Y = \theta(Z) + S^T\beta + \epsilon. \tag{3.1}$$

The partially linear model keeps the flexibility of the nonparametric model for the base- 
line function while maintaining the explanatory power of parametric models. Therefore, 
it has received a lot of attention in the literature. See, for example, Härdle, Liang and 
Gao (2000) and references therein. Various extensions of the partially linear model have 
been proposed in the literature. Li and Nie (2007, 2008) proposed the partially nonlinear 
models 

$$Y = \theta(Z) + f(S;\beta) + \epsilon, \tag{3.2}$$

where $f(S;\beta)$ is a specific, known function that may be nonlinear in $\beta$. See Li and Nie 
(2007, 2008) for some interesting examples. Li and Liang (2008) and Lam and Fan (2008) 
studied the generalized varying coefficient partially linear model 

$$g\{E(Y|Z,S)\} = S_1^T\beta + S_2^T\theta(Z), \tag{3.3}$$

where $g(\cdot)$ is a link function and $(S_1, S_2, Z)$ are covariates. Model (3.3) includes most 
commonly used semi-parametric models, such as the partially linear models (3.1), the 
generalized partially linear models (Severini and Staniswalis (1994)), and semi-varying 
coefficient partially linear models (Fan and Huang (2005)). 

In the presence of covariates measured with error, one may extend the aforementioned 
semi-parametric regression models for measurement error data. As in the last section, 
let X be the covariate vector measured with error. Among these semi-parametric models 
with error, the partially linear measurement error model 

$$Y = \theta(Z) + X^T\beta_1 + S^T\beta_2 + \epsilon \tag{3.4}$$

has been studied in Liang, Härdle and Carroll (1999). Liang and Li (2009) proposed 
a class of variable selection procedures for model (3.4) using penalized least squares and 
penalized quantile regression. Our procedure in this section, however, is directly applicable to both 
the generalized varying coefficient partially linear measurement error model 

$$g\{E(Y|X,Z,S)\} = X^T\beta_1 + S_1^T\beta_2 + S_2^T\theta(Z), \tag{3.5}$$

where $S = (S_1^T, S_2^T)^T$, and the partially nonlinear measurement error model 

$$Y = \theta(Z) + f(X,S;\beta) + \epsilon, \tag{3.6}$$

when an error distribution is assumed. It is worth noting that model (3.6) includes the 
following model as a special case 

$$Y = X^T\beta_1 + S^T\beta_2 + (XZ)^T\beta_3 + \theta(Z) + \epsilon, \tag{3.7}$$

where $(XZ)$ consists of all interaction terms between X and Z, but model (3.4) does not. 
Thus, the variable selection procedures proposed in Liang and Li (2009) are not directly 
applicable to model (3.7). 

In summary, in this section, we consider a general semi-parametric error model that 
includes models (3.4)-(3.6) as its special cases. Specifically, the semi-parametric mea- 
surement error model we consider here also has two parts: 

$$p_{Y|X,Z,S}\{Y|X,Z,S,\beta,\theta(Z)\} \quad\text{and}\quad p_{W|X,Z,S}(W|X,Z,S). \tag{3.8}$$

The major difference from its parametric counterpart is that the main model contains 
unknown functions $\theta(Z)$. It is easy to check that models (3.4)-(3.6) are special cases of 
model (3.8). Note that a simpler version of this model is considered in Ma and Carroll 
(2006), where the dimension of $\beta$ is assumed to be fixed and the dimension of $\theta$ is assumed 
to be one. 

Throughout this paper, we assume that the model is identifiable. We propose the 
penalized estimating equation for the semi-parametric model: 

$$\sum_{i=1}^n \mathcal{L}(W_i,Z_i,S_i,Y_i,\beta,\hat\theta_i) - np'_\lambda(\beta) = 0, \tag{3.9}$$

where $p'_\lambda(\beta)$ has the same form as in (2.3). However, the computation of $\mathcal{L}$ is more involved. 
Denote the dimension of $\theta(Z)$ as m, a fixed and finite integer. If we replace $\theta(Z)$ 
with a single unknown m-dimensional parameter $\alpha$ and append $\alpha$ to $\beta$, we obtain from 
(3.8) a parametric measurement error model with parameters $(\beta^T, \alpha^T)^T$. For this parametric 
model, we can compute the corresponding $S^*_{\mathrm{eff}}$ as done in the last section. Specifically, 
we will have $S^*_{\mathrm{eff}}(W,Z,S,Y) = S^*_{\beta,\alpha}(W,Z,S,Y) - E^*\{a(X,Z,S)|W,Z,S,Y\}$, where 
$a(X,Z,S)$ satisfies $E[E^*\{a(X,Z,S)|W,Z,S,Y\}|X,Z,S] = E\{S^*_{\beta,\alpha}(W,Z,S,Y)|X,Z,S\}$. 
Note that $S^*_{\mathrm{eff}}$ has the same dimension as the dimension of $\beta$ plus m. We write the 
last m components of $S^*_{\mathrm{eff}}$ as $\Psi(W,Z,S,Y,\beta,\alpha)$ and the rest as $\mathcal{L}(W,Z,S,Y,\beta,\alpha)$. We 
now solve for $\hat\theta_i$, $i = 1, \ldots, n$, from 

$$\begin{aligned}
\sum_{i=1}^n K_h(z_i - z_1)\Psi(w_i,z_i,s_i,y_i;\beta,\theta_1) &= 0,\\
&\;\;\vdots\\
\sum_{i=1}^n K_h(z_i - z_n)\Psi(w_i,z_i,s_i,y_i;\beta,\theta_n) &= 0,
\end{aligned} \tag{3.10}$$

where $K_h(z) = h^{-1}K(z/h)$, K is a smooth symmetric kernel function with compact 
support that satisfies $\int K(t)t^2\,\mathrm{d}t = 1$, and h is a bandwidth. Note that $\hat\theta_1, \ldots, \hat\theta_n$ are 
all m-dimensional parameters. Inserting the $\hat\theta_i$'s into $\mathcal{L}$ in (3.9), we obtain a complete 
description of the estimator. Note that $\hat\theta_i$ depends on $\beta$, so a more precise notation 
for $\hat\theta_i$ is $\hat\theta_i(\beta)$. Solving equation (3.9) yields a penalized estimating equation estimate. 
Theorem 3 below gives its convergence rate. 
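
To make the kernel-weighted system (3.10) concrete, here is a minimal Python sketch for the scalar case m = 1. It is only an illustration under stated assumptions: the functions `psi` and `dpsi` are hypothetical user-supplied stand-ins for $\Psi$ and its derivative in $\theta$ (the actual $\Psi$ comes from $S^*_{\mathrm{eff}}$ and is far more involved), the Epanechnikov kernel is one possible compactly supported symmetric kernel, and a plain Newton iteration is used to find each root.

```python
def epanechnikov(t):
    """A symmetric kernel with compact support [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def local_theta(z, obs, beta, psi, dpsi, h, n_newton=20, tol=1e-10):
    """Solve sum_i K_h(z_i - z_j) psi(obs_i; beta, theta_j) = 0 for every j.

    psi, dpsi : hypothetical scalar estimating function and its derivative
                with respect to theta (m = 1 in this sketch)
    """
    thetas = []
    for zj in z:
        # Kernel weights K_h(z_i - z_j) = K((z_i - z_j) / h) / h
        w = [epanechnikov((zi - zj) / h) / h for zi in z]
        theta = 0.0
        for _ in range(n_newton):
            g = sum(wi * psi(o, beta, theta) for wi, o in zip(w, obs))
            dg = sum(wi * dpsi(o, beta, theta) for wi, o in zip(w, obs))
            step = g / dg
            theta -= step  # Newton update for the j-th local equation
            if abs(step) < tol:
                break
        thetas.append(theta)
    return thetas


# Toy check with an error-free partially linear residual, y - s*beta - theta,
# and a constant baseline theta(z) = 3; every local root should be close to 3.
z = [i / 20 for i in range(21)]
s = [1.0 if i % 2 == 0 else -1.0 for i in range(21)]
y = [2.0 * si + 3.0 for si in s]
theta_hat = local_theta(
    z, list(zip(s, y)), beta=2.0,
    psi=lambda o, b, t: o[1] - o[0] * b - t,
    dpsi=lambda o, b, t: -1.0,
    h=0.2,
)
print(theta_hat[:3])
```

Because each equation in (3.10) is homogeneous in the kernel weights, rescaling K by a positive constant leaves the roots unchanged; only the shape and the bandwidth h matter for $\hat\theta_i$.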

Theorem 3. Suppose that condition (P1) holds. Under regularity conditions (B1)-(B4) 
in the Appendix, and if $d_n^4 n^{-1} \to 0$, $\lambda_n \to 0$ when $n \to \infty$, then with probability tending to 
one, there exists a root of (3.9), denoted $\hat\beta$, such that $\|\hat\beta - \beta_0\| = O_p\{\sqrt{d_n}(n^{-1/2} + a_n)\}$. 

The proof of Theorem 3 is given in the Appendix. Theorem 3 indicates that to achieve 
root $(n/d_n)$ convergence rate (or root n convergence rate for finite and fixed d), $\lambda_n$ and 
the penalty function must be chosen such that $a_n = O_p(n^{-1/2})$. 

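Let $\mathcal{L}_I$ be the first $d_1$ components of $\mathcal{L}$, $\mathcal{L}_{I\beta_I}$ the partial derivative of $\mathcal{L}_I$ with respect 
to $\beta_I$, $\mathcal{L}_{I\theta}$ the partial derivative of $\mathcal{L}_I$ with respect to $\theta$, $\Psi_\theta$ the partial derivative of $\Psi$ 
with respect to $\theta$, and $\Psi_{\beta_I}$ the partial derivative of $\Psi$ with respect to $\beta_I$. Also define 
$\Omega(Z) = E(\Psi_\theta|Z)$, $U_I(Z) = E(\mathcal{L}_{I\theta}|Z)\Omega^{-1}(Z)$ and $\theta_{\beta_I}(Z) = -\Omega^{-1}(Z)E(\Psi_{\beta_I}|Z)$. Further 
defining 

$$A = E[\mathcal{L}_{I\beta_I}\{W,Z,S,Y,\beta_0,\theta_0(Z)\} + \mathcal{L}_{I\theta}\{W,Z,S,Y,\beta_0,\theta_0(Z)\}\theta_{\beta_I}(Z,\beta_0)],$$
$$B = \mathrm{cov}[\mathcal{L}_I\{W,Z,S,Y,\beta_0,\theta_0(Z)\} - U_I(Z)\Psi\{W,Z,S,Y,\beta_0,\theta_0(Z)\}],$$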
we obtain the following results. 
Theorem 4. Suppose that condition (P1) holds. Under regularity conditions (B1)-(B4), 
if $\lambda_n \to 0$, $d_n^5 n^{-1} \to 0$, and (2.6) holds, then with probability tending to one, any 
root n consistent estimator $\hat\beta_n = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ obtained in (3.9) must satisfy that 

(i) $\hat\beta_{II} = 0$, 

(ii) for any $d_1 \times 1$ vector v such that $v^Tv = 1$, 

$$\sqrt{n}\,v^TB^{-1/2}(A - \Sigma)\{\hat\beta_I - \beta_{I0} - (A - \Sigma)^{-1}b\} \xrightarrow{D} N(0,1).$$

The proof of Theorem 4 is given in the Appendix. Theorem 4 implies that for fixed 
and finite d, the convergence rate of the resulting estimate is $n^{-1/2} + a_n$. It also implies 
that any root n consistent solution $\hat\beta = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ of (3.9) must satisfy $\hat\beta_{II} = 0$, and $\hat\beta_I$ 
has the following asymptotic normality: 

$$\sqrt{n}\{\hat\beta_I - \beta_{I0} - (A - \Sigma)^{-1}b\} \xrightarrow{D} N\{0, (A - \Sigma)^{-1}B(A - \Sigma)^{-T}\}.$$

See the earlier version of this work, Ma and Li (2007), for details. 

4. Numerical studies and application 

In this section, we provide implementation details such as tuning parameter selection 
and model error approximation. Issues related to the numerical procedure to solve (2.2) 
and (3.9), the choice of kernel and bandwidth in the semi-parametric model, and the 
treatment of multiple roots have been addressed in Ma and Carroll (2006) and are not 
further discussed here. We assess the finite sample performance of the proposed procedure 
by Monte Carlo simulation and illustrate the proposed methodology by an empirical 
analysis of the Framingham heart study data. In our simulation, we concentrate on the 
performance of the proposed procedure for a quadratic logistic measurement error model 
and a partially linear logistic measurement error model in terms of model complexity 
and model error. 

4.1. Tuning parameter selection 

An MM algorithm (Hunter and Li (2005)) and a local linear approximation (LLA) algo- 
rithm (Zou and Li (2008)) have been proposed for penalized likelihood with non-concave 
penalty. However, both the minorize-maximize (MM) algorithm and the LLA algorithm 
are difficult to implement for the measurement error models we consider. Thus, we em- 
ploy the local quadratic approximation (LQA) algorithm (Fan and Li (2001)) to solve 
the penalized estimating equations. Specifically, in implementing the Newton-Raphson 
algorithm to solve the penalized estimating equations, we locally approximate the first-order 
derivative of the penalty function by a linear function, following the idea of the 
LQA. Suppose that at the kth step of the iteration, we obtain the value $\beta^{(k)}$. 
Then, for $\beta_j^{(k)}$ not very close to zero, 

$$p'_\lambda(\beta_j) = p'_\lambda(|\beta_j|)\mathrm{sign}(\beta_j) \approx \frac{p'_\lambda(|\beta_j^{(k)}|)}{|\beta_j^{(k)}|}\beta_j.$$

Otherwise, we set $\beta_j^{(k+1)} = 0$, and exclude the corresponding covariate from the model. 
This approximation is updated in every step of the Newton-Raphson algorithm iteration. 
In practice, we set the initial value of $\beta$ to be the unpenalized estimating equation 
estimate. It can be shown that when the algorithm converges, the solution will satisfy the 
penalized estimating equations. Following Theorems 2 and 4, we can further approximate 
the estimation variance of the resulting estimator. That is, 

$$\widehat{\mathrm{cov}}(\hat\beta) = \frac{1}{n}(E - \Sigma_\lambda)^{-1}F(E - \Sigma_\lambda)^{-T},$$

where $\Sigma_\lambda$ is a diagonal matrix with elements equal to $p'_\lambda(|\hat\beta_j|)/|\hat\beta_j|$ for non-vanishing $\hat\beta_j$, 
a linear approximation of $\Sigma$ defined in (2.5). We use E to denote the sample approximation 
of $E\{\partial S^*_{\mathrm{eff},I}(W,Z,Y,\beta_I)/\partial\beta_I^T\}$ evaluated at $\hat\beta$ for the parametric model (2.1) and 
the sample approximation of the matrix A evaluated at $\hat\beta$ for the semi-parametric model 
(3.8). Similarly, we use F to denote the sample approximation of $\mathrm{cov}(S^*_{\mathrm{eff},I})$ evaluated 
at $\hat\beta$ for the parametric model and the sample approximation of the matrix B evaluated 
at $\hat\beta$ for the semi-parametric model, respectively. The consistency of the proposed 
sandwich formula can be shown by using similar techniques as in Fan and Peng (2004). 
The accuracy of this sandwich formula will be tested in our simulation studies. 
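
The LQA-based Newton-Raphson iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `lqa_solve`, `solve` and `linear_score` are hypothetical, the thresholding constant `eps` is an arbitrary choice for "very close to zero", and an ordinary least-squares score on a small deterministic data set stands in for $S^*_{\mathrm{eff}}$, whose construction requires solving integral equations.

```python
def scad_derivative(size, lam, a=3.7):
    """SCAD p'_lambda at a nonnegative argument, cf. (2.3)."""
    return lam if size <= lam else max(a * lam - size, 0.0) / (a - 1.0)


def solve(A, b):
    """Solve a small linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lqa_solve(est_fn, beta_init, n, lam, a=3.7, eps=1e-6, tol=1e-10, max_iter=200):
    """Newton-Raphson for sum_i S_i(beta) - n p'_lambda(beta) = 0 with the LQA.

    est_fn(beta) returns (U, J): the summed estimating function and its Jacobian.
    """
    beta = list(beta_init)
    for _ in range(max_iter):
        beta = [0.0 if abs(b) < eps else b for b in beta]   # drop tiny coefficients
        active = [j for j, b in enumerate(beta) if b != 0.0]
        if not active:
            break
        U, J = est_fn(beta)
        # Diagonal of Sigma_lambda: p'_lambda(|beta_j|) / |beta_j| on the active set
        s = [scad_derivative(abs(beta[j]), lam, a) / abs(beta[j]) for j in active]
        A = [[-J[j][k] + (n * s[r] if j == k else 0.0) for k in active]
             for r, j in enumerate(active)]
        rhs = [U[j] - n * s[r] * beta[j] for r, j in enumerate(active)]
        step = solve(A, rhs)
        for j, d in zip(active, step):
            beta[j] += d
        if max(abs(d) for d in step) < tol:
            break
    return [0.0 if abs(b) < eps else b for b in beta]


# Toy data (an assumption for illustration): orthogonal +/-1 design, true beta = (2, 0).
n = 200
x = [[1.0 if i % 2 == 0 else -1.0, 1.0 if (i // 2) % 2 == 0 else -1.0] for i in range(n)]
y = [2.0 * xi[0] + 0.02 * ((7 * i) % 11 - 5) for i, xi in enumerate(x)]


def linear_score(beta):
    """Least-squares score sum_i x_i (y_i - x_i^T beta) and its Jacobian."""
    U, J = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        r = yi - sum(b * v for b, v in zip(beta, xi))
        for j in range(2):
            U[j] += xi[j] * r
            for k in range(2):
                J[j][k] -= xi[j] * xi[k]
    return U, J


beta_start = lqa_solve(linear_score, [1.0, 1.0], n, lam=0.0)  # unpenalized start
beta_hat = lqa_solve(linear_score, beta_start, n, lam=0.5)    # SCAD-penalized fit
print(beta_start, beta_hat)
```

In this toy run the second coefficient is shrunk geometrically and then set exactly to zero, while the first coefficient exceeds $a\lambda$ and therefore keeps its unpenalized value. The matrix `A` assembled in each step is the sample analogue of $-(E - \Sigma_\lambda)$ scaled by n, so the same quantities could be reused for the sandwich formula.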

It is desirable to have automatic, data-driven methods to select the tuning parameter 
$\lambda$. Here we will consider two tuning parameter selectors, the GCV and BIC. To define 
the GCV and BIC statistics, we need to define the degrees of freedom and goodness-of-fit 
measure for the final selected model. Similar to the nonconcave penalized likelihood 
approach, we may define the effective number of parameters or degrees of freedom to be 

$$df_\lambda = \mathrm{trace}\{I(I + \Sigma_\lambda)^{-1}\},$$

where I stands for the Fisher information matrix. For the logistic regression models 
employed in this section, a natural approximation of I, ignoring the measurement error 
effect, is $V^TQV$, where V represents the covariates included in the model and Q is a 
diagonal matrix with the ith element equal to $\hat\mu_{\lambda,i}(1 - \hat\mu_{\lambda,i})$. Here, $\hat\mu_{\lambda,i} = \hat P(Y_i = 1|V_i)$. 

In the logistic regression model context of this section, we may employ its deviance as 
a goodness-of-fit measure. Specifically, let $\mu_i$ be the conditional expectation of $Y_i$ given 
its covariates for $i = 1, \ldots, n$. The deviance of a model fit $\hat\mu_\lambda = (\hat\mu_{\lambda,1}, \ldots, \hat\mu_{\lambda,n})^T$ is defined 
to be 

$$D(\hat\mu_\lambda) = 2\sum_{i=1}^n[Y_i\log(Y_i/\hat\mu_{\lambda,i}) + (1 - Y_i)\log\{(1 - Y_i)/(1 - \hat\mu_{\lambda,i})\}].$$

Define the GCV statistic to be 

$$\mathrm{GCV}(\lambda) = \frac{D(\hat\mu_\lambda)}{n(1 - df_\lambda/n)^2}$$

and the BIC statistic to be 

$$\mathrm{BIC}(\lambda) = D(\hat\mu_\lambda) + 2\log(n)\,df_\lambda.$$

The GCV and the BIC tuning parameter selectors select $\lambda$ by minimizing $\mathrm{GCV}(\lambda)$ and 
$\mathrm{BIC}(\lambda)$, respectively. Note that the BIC tuning parameter selector is distinguished from 
the traditional BIC variable selection criterion, which is not well defined for estimating 
equation methods. Wang, Li and Tsai (2007) provided a study on the asymptotic behavior 
for the GCV and BIC tuning parameter selectors for the non-concave penalized least-squares 
variable selection procedures in linear and partially linear regression models. 
Further study of the asymptotic property of the proposed tuning parameter selection is 
needed, but it is outside the scope of this paper. 
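
The two selectors can be written down directly from the displayed definitions. The Python sketch below is illustrative only: the function names are our own, the degrees of freedom $df_\lambda$ are taken as given (computing them requires the matrices I and $\Sigma_\lambda$ from a fitted model), and the candidate triples in `fits` are made-up numbers rather than results from the paper.

```python
import math


def binomial_deviance(y, mu):
    """Deviance D(mu) for binary responses; terms of the form 0*log(0) are taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += -math.log(mi) if yi == 1 else -math.log(1.0 - mi)
    return 2.0 * total


def gcv(deviance, df, n):
    """GCV(lambda) = D / {n (1 - df/n)^2}."""
    return deviance / (n * (1.0 - df / n) ** 2)


def bic(deviance, df, n):
    """BIC(lambda) = D + 2 log(n) df, as displayed in the text."""
    return deviance + 2.0 * math.log(n) * df


def select_lambda(fits, n, criterion):
    """fits: iterable of (lam, deviance, df) triples from already-fitted models."""
    return min(fits, key=lambda f: criterion(f[1], f[2], n))[0]


# Made-up candidates: a larger lambda gives a sparser fit with a larger deviance.
fits = [(0.1, 120.0, 6.0), (0.3, 125.0, 3.0), (0.6, 160.0, 1.0)]
lam_gcv = select_lambda(fits, 200, gcv)
lam_bic = select_lambda(fits, 200, bic)
print(lam_gcv, lam_bic)
```

With these hypothetical inputs the BIC-type selector prefers a larger $\lambda$ than the GCV-type selector, because its $2\log(n)\,df_\lambda$ term charges more for each effective parameter.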



4.2. Model error 

Model error is an effective way of evaluating model adequacy versus model complexity. 
To implement the concept of model error in evaluating our procedure, we first simplify its 
definition for the logistic partially linear measurement error model. Denote //(/S, X, Z) = 
E{Y\S, X, Z), and define the model error for a model fl{S, X, Z) as 

Mi?(A) - E{^{S+,X+,Z+) - fi(S+,X+, Z+)}\ 

where the expectation is taken over the new observation S*"*", X"*" and . Let g{-) 
be the logit link. For the logistic partially linear model, the mean function has the 
form ^i{S,X,Z)=g-^{B{Z) + j3'^V}, where F = (S^, XT)T. if and /3 are consistent 
estimators for 9{-) and /3, respectively, then by a Taylor expansion the model error can 
be approximated by 

ME{fL) « E{g-^{e{z+) + p'^v+Y[{e{z+) - e{z+)Y 

+ CP'^V+ - p'^V+f + 2{§{Z+) - 0{Z+)}0^V+ - P^V+)]). 

The first component is the inherent model error due to 0{-), the second one is due to the 
lack of fit of /3, and the third one is the cross-product between the first two components. 
Thus, to assess the performance of the proposed variable selection procedure, we define 
the approximate model error (AME) for f3 to be 



AME0) = E[g-^{9{Z+) + l3^V+y0^V+ - l3^V+)% 
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Furthermore, the AME of /? can be written as 

AME0) = 0- (3fE[g''^{d{Z+) + /3^F+}V+y+'^](/3 - /3) 

(4-1) 

In our simulation, the matrix $C_X$ is estimated by 1 million Monte Carlo simulations. For
measurement error data, we observe $W$ instead of $X$. We also consider an alternative
approximate model error

$$\mathrm{AME}_W(\hat\beta) = (\hat\beta - \beta)^T C_W (\hat\beta - \beta), \qquad (4.2)$$

where $C_W$ is obtained by replacing $X$ with $W$ in the definition of $C_X$. The $\mathrm{AME}(\hat\beta)$
and $\mathrm{AME}_W(\hat\beta)$ are defined for the parametric model case by setting $\theta(\cdot) = 0$. Note that
although we defined the model error in the context of a logistic link function, it is
certainly not restricted to such a case. The general approach for calculating AME is
to approximate the probability density function evaluated at the estimated parameters
around the true parameter value and to extract the linear term of the parameter of
interest. $\mathrm{AME}_W$ is calculated by replacing $X$ with $W$.
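As a rough illustration of the quadratic form in (4.1), the following sketch estimates the weighting matrix by Monte Carlo for a parametric logistic model (the $\theta(\cdot) = 0$ case) and then evaluates the approximate model error. The function names, the toy two-dimensional design and the number of draws are assumptions made for this example only; they are not taken from the paper.

```python
import math
import random


def inv_logit_deriv(eta):
    """Derivative of the inverse logit link: p * (1 - p) with p = expit(eta)."""
    if eta >= 0:
        p = 1.0 / (1.0 + math.exp(-eta))
    else:
        e = math.exp(eta)
        p = e / (1.0 + e)
    return p * (1.0 - p)


def estimate_c(beta, draw_v, n_draws=20000, seed=0):
    """Monte Carlo estimate of E[{d g^{-1}(beta'V)}^2 V V'] when theta = 0."""
    rng = random.Random(seed)
    d = len(beta)
    c = [[0.0] * d for _ in range(d)]
    for _ in range(n_draws):
        v = draw_v(rng)
        eta = sum(b * x for b, x in zip(beta, v))
        weight = inv_logit_deriv(eta) ** 2
        for j in range(d):
            for k in range(d):
                c[j][k] += weight * v[j] * v[k] / n_draws
    return c


def ame(beta_hat, beta, c):
    """Quadratic form (beta_hat - beta)' C (beta_hat - beta)."""
    diff = [bh - b for bh, b in zip(beta_hat, beta)]
    d = len(diff)
    return sum(diff[j] * c[j][k] * diff[k] for j in range(d) for k in range(d))


def draw_toy_v(rng):
    """Hypothetical design: an intercept and one standard normal covariate."""
    return [1.0, rng.gauss(0.0, 1.0)]


beta_true = [0.0, 1.5]
c_x = estimate_c(beta_true, draw_toy_v)
print(ame([0.1, 1.4], beta_true, c_x))
```

Passing a sampler that returns the error-prone covariate in place of the true one would give the analogue of $C_W$ in (4.2).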

4.3. Simulation examples 

To demonstrate the performance of our method in both parametric and semi-parametric 
measurement error models, we conduct two simulation studies. In our simulation, we 
will examine only the performance of the penalized estimating equation method with the 
SCAD penalty. 

Example 1. In this example, we generate data from a logistic model where the covariate
measured with error enters the model through a quadratic function and the covariates
measured without error enter linearly. The measurement error follows a normal additive
pattern. Specifically,

$$\mathrm{logit}\{\mathrm{pr}(Y = 1 \mid X, Z)\} = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 Z_1 + \beta_4 Z_2 + \beta_5 Z_3 + \beta_6 Z_4 + \beta_7 Z_5 + \beta_8 Z_6 + \beta_9 Z_7$$

and

$$W = X + U,$$

where $\beta = (0, 1.5, 2, 0, 3, 0, 1.5, 0, 0, 0)^T$, the covariate $X$ is generated from a normal
distribution with mean 0 and variance 1, $(Z_1, \ldots, Z_6)^T$ is generated from a normal distribution
with mean 0, and the covariance between $Z_i$ and $Z_j$ is $0.5^{|i-j|}$. The last component of the
$Z$ covariates, $Z_7$, is a binary variable taking value 0 or 1 with equal probability. $U$ is
normally distributed with mean 0 and standard deviation 0.1. In our simulation, the
sample size is taken to be either n = 500 or n = 1000.
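The data-generating mechanism of Example 1 can be sketched as follows. This is only an illustration under stated assumptions: the continuous $Z$'s are drawn with unit variances through an AR(1) recursion, which reproduces the covariance $0.5^{|i-j|}$, and $Z_7$ is drawn independently of the other covariates (the text does not say how it is related to them). The function names are hypothetical.

```python
import math
import random

# (beta_0, ..., beta_9) as given in Example 1
BETA = [0.0, 1.5, 2.0, 0.0, 3.0, 0.0, 1.5, 0.0, 0.0, 0.0]


def ar1_normals(rng, k, rho=0.5):
    """k unit-variance normals with cov(Z_i, Z_j) = rho ** |i - j|."""
    z = [rng.gauss(0.0, 1.0)]
    for _ in range(k - 1):
        z.append(rho * z[-1] + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
    return z


def expit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def draw_example1(rng, sigma_u=0.1):
    """One observation (Y, W, Z); the true X is generated but not returned."""
    x = rng.gauss(0.0, 1.0)
    # Z_1..Z_6 correlated normals; Z_7 binary, assumed independent of the rest
    z = ar1_normals(rng, 6) + [1.0 if rng.random() < 0.5 else 0.0]
    eta = BETA[0] + BETA[1] * x + BETA[2] * x * x
    eta += sum(b * zj for b, zj in zip(BETA[3:], z))
    y = 1 if rng.random() < expit(eta) else 0
    w = x + rng.gauss(0.0, sigma_u)  # W = X + U
    return y, w, z


rng = random.Random(2024)
sample = [draw_example1(rng) for _ in range(500)]  # one data set with n = 500
```

Only $(Y, W, Z)$ is returned because $X$ is unobservable in the measurement error setting.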
Table 1. MRMEs and model complexity for Example 1

                   RAME             RAME_W           # of zero coefficients
Method    n        median (MAD)     median (MAD)     C         E
GCV       500      0.694 (0.231)    0.698 (0.228)    4.574     0.006
BIC       500      0.396 (0.188)    0.396 (0.187)    5.857     0.074
GCV       1000     0.766 (0.187)    0.770 (0.185)    4.456
BIC       1000     0.390 (0.157)    0.401 (0.158)    5.758     0.010


For the selected model, the model complexity is summarized in terms of the number
of zero coefficients and the model error is summarized in terms of the relative approximate
model error (RAME), defined to be the ratio of the model error of the selected model to that
of the full model. In Table 1, the RAME column reports the sample median and the
median absolute deviation (MAD) divided by a factor of 0.6745 of the RAME values over
1000 simulations. Similarly, the RAME_W column reports those of the RAME_W
values over 1000 simulations. From Table 1, it can be seen that the values of RAME and
RAME_W are very close. The average count of zero coefficients is also reported in Table 1,
where the column labeled "C" presents the average count restricted only to the true zero
coefficients, while the column labeled "E" displays the average count of the coefficients
erroneously set to 0.
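The summary statistics reported in the RAME columns can be computed along the following lines. This is a minimal sketch with hypothetical function names and made-up input values; dividing the MAD by 0.6745 puts it on the scale of a standard deviation for normally distributed data.

```python
import statistics


def rame(ame_selected, ame_full):
    """Ratio of the model error of the selected model to that of the full model."""
    return ame_selected / ame_full


def median_and_scaled_mad(values, scale=0.6745):
    """Sample median and median absolute deviation divided by `scale`."""
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad / scale


# Made-up model errors from five hypothetical simulation runs
ratios = [rame(a, 2.0) for a in (0.8, 1.0, 1.2, 1.4, 9.0)]
print(median_and_scaled_mad(ratios))
```

Both statistics are robust to occasional runs with a very large model error, such as the last value above.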

We next verify the consistency of the estimators and test the accuracy of the proposed
standard error formula. Table 2 displays the bias and sample standard deviation (SD)
of the estimates for two non-zero coefficients, $(\beta_1, \beta_2)$, over 1000 simulations and the
sample average and the sample standard deviation of the 1000 standard errors obtained
by using the sandwich formula. The row labeled "EE" corresponds to the unpenalized
estimating equation estimator. We omit here the results for other non-zero coefficients
and the results under sample size n = 500. Interested readers can find them in an earlier
version of this work, Ma and Li (2007). Overall, the estimators are consistent and the
sandwich formula works well.



Table 2. Bias and standard errors for Example 1 (n = 1000)

           $\beta_1$                          $\beta_2$
Method     Bias (SD)        SE (Std(SE))      Bias (SD)        SE (Std(SE))
EE         0.072 (0.273)    0.268 (0.062)     0.124 (0.332)    0.321 (0.088)
GCV        0.029 (0.254)    0.250 (0.048)     0.009 (0.258)    0.253 (0.057)
BIC        0.024 (0.290)    0.249 (0.054)     0.052 (0.255)    0.244 (0.052)
Table 3. MRMEs and model complexity for Example 2

                   RME_X            RME_W            # of zero coefficients
Method    n        median (MAD)     median (MAD)     C         E
GCV       500      0.878 (0.161)    0.880 (0.158)    4.060
BIC       500      0.381 (0.158)    0.387 (0.155)    5.713
GCV       1000     0.868 (0.164)    0.873 (0.160)    4.061
BIC       1000     0.386 (0.162)    0.392 (0.161)    5.694



Example 2. In this example, we illustrate the performance of the method for a
semi-parametric measurement error model. Simulation data are generated from

$$\mathrm{logit}\{\mathrm{pr}(Y = 1 \mid X, S, Z)\} = \beta_1 X + \beta_2 S_1 + \cdots + \beta_{10} S_9 + \theta(Z),$$
$$W = X + U,$$

where $\beta$, $X$, and $W$ are the same as in the previous simulation. We generate the $S$'s in a
fashion similar to the $Z$'s in Example 1. That is, $(S_1, \ldots, S_8)$ is generated from a normal
distribution with mean zero, and the covariance between $S_i$ and $S_j$ is $0.5^{|i-j|}$. $S_9$ is a
binary variable with equal probability to be zero or one. The random variable $Z$ is
generated from a uniform distribution on $[-\pi/2, \pi/2]$. The true function is $\theta(Z) = 0.5\cos(Z)$.
The parameter takes values $\beta = (1.5, 2, 0, 0, 3, 0, 1.5, 0, 0, 0)^T$.

The simulation results are summarized in Table 3, with notation similar to that of
Table 1. From Table 3, we can see that the penalized estimating equation estimators
can significantly reduce model complexity. Overall, the BIC tuning parameter selector
performs better, while GCV is too conservative. We have further tested the consistency
and the accuracy of the standard error formula derived from the sandwich formula.
The result is summarized in Table 4, with notation similar to that of Table 2. We note
the consistency of the estimator and that the standard error formula performs very well.
More simulation results are summarized in the earlier version of the work, Ma and Li
(2007).



Table 4. Bias and standard errors for Example 2 (n = 1000)

           $\beta_1$                          $\beta_2$
Method     Bias (SD)        SE (Std(SE))      Bias (SD)        SE (Std(SE))
EE         0.039 (0.170)    0.166 (0.018)     0.057 (0.194)    0.190 (0.018)
GCV        0.047 (0.174)    0.172 (0.020)     0.069 (0.196)    0.191 (0.021)
BIC        0.031 (0.169)    0.170 (0.019)     0.044 (0.179)    0.185 (0.019)
4.4. An application

The Framingham heart study data set (Kannel et al. (1986)) is a well-known data set
where it is generally accepted that measurement error exists on the long-term systolic
blood pressure (SBP). In addition to SBP, other measurements include age, smoking
status, and serum cholesterol. In the literature, there has been speculation that a
second-order term involving age might be needed to analyze the dependence of heart disease
occurrence. In addition, it is unclear if the interaction between the various covariates
plays a role in influencing the heart disease rate. The data set includes 1615 observations.

With the method developed here, it is possible to perform a variable selection to
address these issues. Following the literature, we adopt the measurement error model
log(MSBP − 50) = log(SBP − 50) + U, where U is a mean zero normal random variable
with variance $\sigma^2 = 0.0126$ and MSBP is the measured SBP. We denote the standardized
log(MSBP − 50) as $W$. Applying the same standardization parameters to log(SBP − 50)
gives $X$. The standardized serum cholesterol and age are denoted by $Z_1$ and $Z_2$,
respectively, and $Z_3$ denotes the binary variable smoking status. Using $Y$ to denote the
occurrence of heart disease, the saturated model that includes all the interaction terms
and also the square of age term is of the form

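$$\mathrm{logit}\{\mathrm{pr}(Y = 1 \mid X, Z\text{'s})\} = \beta_1 X + \beta_2 X Z_1 + \beta_3 X Z_2 + \beta_4 X Z_3 + \beta_5 + \beta_6 Z_1 + \beta_7 Z_2 + \beta_8 Z_3 + \beta_9 Z_2^2 + \beta_{10} Z_1 Z_2 + \beta_{11} Z_1 Z_3 + \beta_{12} Z_2 Z_3,$$

$$W = X + U.$$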
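To make the twelve terms of the saturated model concrete, the sketch below lists them in the order $\beta_1, \ldots, \beta_{12}$ and shows one way to standardize a covariate. The helper names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify the standardization details. Note that $X$ itself is not observed; in the data only $W$ is available.

```python
import statistics

TERM_NAMES = ["X", "X*Z1", "X*Z2", "X*Z3", "1", "Z1", "Z2", "Z3",
              "Z2^2", "Z1*Z2", "Z1*Z3", "Z2*Z3"]


def standardize(values):
    """Center and scale a sample; returns (standardized values, mean, sd)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (an assumption)
    return [(v - mean) / sd for v in values], mean, sd


def saturated_terms(x, z1, z2, z3):
    """Covariate terms of the saturated model, ordered as beta_1..beta_12."""
    return [x, x * z1, x * z2, x * z3, 1.0, z1, z2, z3,
            z2 * z2, z1 * z2, z1 * z3, z2 * z3]


# Made-up covariate values, for illustration only
for name, value in zip(TERM_NAMES, saturated_terms(0.3, -1.2, 0.8, 1.0)):
    print(name, value)
```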

We used both the GCV and BIC tuning parameter selectors to choose $\lambda$. We present
the tuning parameters and the corresponding GCV and BIC scores in Figure 1.
The final chosen $\lambda$ is 0.073 and 0.172 by the GCV and BIC selectors, respectively.
The selected model is depicted in Table 5. The GCV criterion selects the covariates
$X$, $XZ_1$, 1, $Z_1$, $Z_2$, $Z_3$, $Z_2^2$, $Z_2Z_3$ into the model, while the BIC criterion selects the
covariates $X$, 1, $Z_1$, $Z_2$ into the model. We report the selection and estimation results in
Table 5, as well as the semi-parametric estimation results without variable selection.
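Choosing the tuning parameter from a grid and rescaling the scores to [0, 1] for plotting can be sketched as below. The sketch assumes that the selector picks the grid value with the smallest score; the grid and the score values are placeholders invented for illustration and are not the Framingham results.

```python
def normalize_scores(scores):
    """Min-max rescale a list of scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def select_lambda(grid, scores):
    """Return the grid value whose score is smallest."""
    best = min(range(len(grid)), key=lambda i: scores[i])
    return grid[best]


# Placeholder grid and scores (hypothetical numbers)
grid = [0.05, 0.10, 0.15, 0.20, 0.25]
scores = [3.2, 2.7, 2.4, 2.6, 3.0]
print(select_lambda(grid, scores), normalize_scores(scores))
```

The rescaling does not change the minimizer; it only puts scores from different criteria on a common axis.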

As shown, the terms $X$, 1, $Z_1$, $Z_2$ are selected by both criteria, while $Z_3$, $Z_2^2$, and some
of the interaction terms are selected only by GCV. The BIC criterion is very aggressive
and it results in a very simple final model, while the GCV criterion is much more
conservative, hence the resulting model is more complex. This agrees with the simulation
results obtained. Since both criteria have included the covariate $X$, the measurement
error feature and its treatment in the Framingham data is unavoidable.

5. Discussion 

In this paper, we have proposed a new class of variable selection procedures in the frame- 
work of measurement error models. The procedure is proposed in a completely general 
functional measurement error model setting and is suitable for both parametric and semi- 
parametric models that contain unspecified smooth functions of an observable covariate. 
[Figure 1: plots of GCV and BIC scores versus $\lambda$.]

Figure 1. Tuning parameters and their corresponding BIC and GCV scores for the Framingham
data. The scores are normalized to the range [0, 1].

Table 5. Results for the Framingham data set

              EE                  GCV                 BIC
              $\hat\beta$ (SE)    $\hat\beta$ (SE)    $\hat\beta$ (SE)
X             0.643 (0.248)       0.416 (0.093)       0.179 (0.039)
XZ1           -0.167 (0.097)      -0.072 (0.041)      (NA)
XZ2           -0.059 (0.111)      (NA)                (NA)
XZ3           -0.214 (0.249)      (NA)                (NA)
Intercept     -3.415 (0.428)      -3.255 (0.356)      -2.555 (0.092)
Z1            0.516 (0.212)       0.332 (0.085)       0.124 (0.033)
Z2            1.048 (0.341)       1.044 (0.329)       0.398 (0.067)
Z3            1.060 (0.443)       0.907 (0.373)       (NA)
Z2^2          -0.253 (0.125)      -0.262 (0.121)      (NA)
Z1Z2          -0.072 (0.103)      (NA)                (NA)
Z1Z3          -0.161 (0.225)      (NA)                (NA)
Z2Z3          -0.442 (0.336)      -0.473 (0.326)      (NA)

We have assumed the error model $p_{W|X,Z}(W \mid X, Z)$ to be completely known for ease
of presentation. When the error model contains an unknown parameter $\xi$, the
identifiability of the problem requires additional information such as multiple measurements
or instruments. Such information should be incorporated to estimate $\xi$. Specifically, in
the variable selection context, we can simply append the estimating equation with these
additional estimating equations obtained from the corresponding score functions with
respect to $\xi$ and append the penalty function $p'_\lambda$ with zeros. Because the augmented
estimating equations still have the same convergence property as illustrated in Ma and
Carroll (2006), the same asymptotic convergence rates and oracle properties hold as in
the known $\xi$ case, without any efficiency loss. When the error model $p_{W|X,Z}(W \mid X, Z)$ is
completely unspecified, a nonparametric estimation of the measurement error distribution
has to be carried out first, then the result can be plugged into the proposed variable
selection and estimation procedure. In this case, the asymptotic convergence rate of the
parameters and the oracle property remain the same, but the asymptotic variance will
increase. The details of incorporating the estimation of unknown error and demonstrating
its subsequent convergence property in the estimation framework are the focus of Hall
and Ma (2007).

We also would like to point out that in the special case of generalized linear models
and normal additive error with possible heteroscedasticity, the procedure of solving linear
integral equations can be spared and the estimating equations are simplified significantly
(Ma and Tsiatis (2006)). In such situations, the computational complexity of the proposed
procedure will be reduced to about the same level as for variable selection in regressions
without errors in the variables.

As pointed out by the referee, it is interesting to perform variable selection for high- 
dimensional data. In this paper, we allow the number of covariates to grow to infinity 
at a Op(n"^/^) rate as the sample size n increases. However, the proposed procedures 
and the algorithm used in this paper may not be directly applicable to large p, small
n problems. Variable selection in the large p, small n setting is a very active research
topic, and it is challenging to extend the existing variable selection procedures for large p,
small n problems to measurement error data. Further research is needed on this topic,
but it is outside the scope of this paper.

Appendix 

Global assumption (P1) on the penalty function:

(P1) Let $c_n = \max\{|p''_{\lambda_n}(|\beta_{j0}|)| : \beta_{j0} \neq 0\}$. Assume that $\lambda_n \to 0$, $a_n = O(n^{-1/2})$ and
$c_n \to 0$ as $n \to \infty$. In addition, there exist constants $C$ and $D$ such that when
$\gamma_1, \gamma_2 > C\lambda_n$, $|p''_{\lambda_n}(\gamma_1) - p''_{\lambda_n}(\gamma_2)| \le D|\gamma_1 - \gamma_2|$.

It is easy to verify that both the $L_1$ and the SCAD penalties satisfy this condition.
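For the SCAD penalty this verification can be made concrete with a small numerical sketch. The code below uses the standard SCAD derivative with the conventional choice a = 3.7, which is an assumption here since this section does not state the value of a. Beyond $a\lambda$ both derivatives vanish, so the Lipschitz requirement in (P1) holds with $C = a$ and any $D \ge 0$; for the $L_1$ penalty the second derivative is zero on $(0, \infty)$, so the condition holds trivially.

```python
def scad_deriv(theta, lam, a=3.7):
    """First derivative p'_lambda(theta) of the SCAD penalty for theta >= 0."""
    if theta <= lam:
        return lam
    return max(a * lam - theta, 0.0) / (a - 1.0)


def scad_second_deriv(theta, lam, a=3.7):
    """Second derivative p''_lambda(theta) for theta > 0, away from the
    knots lam and a * lam where it is not defined."""
    if lam < theta < a * lam:
        return -1.0 / (a - 1.0)
    return 0.0


lam = 0.2
# Points beyond C * lam with C = a: the second derivative is identically zero,
# so |p''(g1) - p''(g2)| <= D |g1 - g2| holds for every D >= 0.
points = [3.7 * lam + 0.01 * k for k in range(1, 50)]
print(max(abs(scad_second_deriv(g, lam)) for g in points))
```

The same calculation shows that the first and second derivatives are zero at any non-zero coefficient larger than $a\lambda_n$ in absolute value, which is the mechanism that lets $a_n$ and $c_n$ vanish as $\lambda_n \to 0$.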

The regularity conditions for Theorems 1 and 2:

(A1) The expectation of the first derivative of $S^*_{\mathrm{eff}}$ with respect to $\beta$ exists at $\beta_0$ and
its left eigenvalues are bounded away from zero and infinity uniformly for all n.
For any entry $S_{jk}$ in $\partial S^*_{\mathrm{eff}}(\beta_0)/\partial\beta^T$, $E(S_{jk}^2) < C_1 < \infty$.

(A2) The eigenvalues of the matrix $E(S^*_{\mathrm{eff}} S^{*T}_{\mathrm{eff}})$ satisfy $0 < C_2 < \lambda_{\min} < \cdots <
\lambda_{\max} < C_3 < \infty$ for all n. For any entries $S_k$, $S_j$ in $S^*_{\mathrm{eff}}(\beta_0)$, $E(S_k^2 S_j^2) < C_4 < \infty$.

(A3) The second derivatives of $S^*_{\mathrm{eff}}$ with respect to $\beta$ exist and the entries are
uniformly bounded by a function $M(W_i, Z_i, Y_i)$ in a large enough neighborhood of
$\beta_0$. In addition, $E(M^2) < C_5 < \infty$ for all n, d.

Conditions (A1)-(A3) are mild regularity conditions. They guarantee that the solution
of the following estimating equation

$$\sum_{i=1}^n S^*_{\mathrm{eff}}(W_i, Z_i, Y_i, \beta) = 0$$

is root $(n/d_n)$ convergent, and possesses asymptotic normality.

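Proof of Theorem 1. Condition (A1) allows us to define

$$J = \Big[E\Big\{\frac{\partial S^*_{\mathrm{eff}}}{\partial\beta^T}\Big|_{\beta_0}\Big\}\Big]^{-1}, \qquad \phi^*_{\mathrm{eff}} = J S^*_{\mathrm{eff}} \qquad\text{and}\qquad q_{\lambda_n}(\beta) = J p'_{\lambda_n}(\beta).$$

Let $\alpha_n = n^{-1/2} + a_n$ and $\phi^*_{\mathrm{eff},i}(\beta) = \phi^*_{\mathrm{eff}}(W_i, Z_i, Y_i, \beta)$. It suffices to show that

$$n^{-1/2}\sum_{i=1}^n \phi^*_{\mathrm{eff},i}(\beta) - n^{1/2} q_{\lambda_n}(\beta) = 0 \qquad (A1)$$

has a solution $\hat\beta$ that satisfies $\|\hat\beta - \beta_0\| = O_p(\sqrt{d_n}\,\alpha_n)$. This will be shown using the
Brouwer fixed point theorem. Using the Taylor expansion, we have

$$n^{-1/2}\sum_{i=1}^n \phi^*_{\mathrm{eff},i}(\beta) = n^{-1/2}\sum_{i=1}^n \phi^*_{\mathrm{eff},i}(\beta_0) + n^{-1/2}\sum_{i=1}^n \frac{\partial \phi^*_{\mathrm{eff},i}(\beta^*)}{\partial\beta^T}(\beta - \beta_0)\{1 + o_p(1)\},$$

where $\beta^*$ is between $\beta$ and $\beta_0$. It can be shown by conditions (A1)-(A3) and the definition
of $\phi^*_{\mathrm{eff}}(\cdot)$ that

$$(\beta - \beta_0)^T\Big\{n^{-1}\sum_{i=1}^n \frac{\partial \phi^*_{\mathrm{eff},i}(\beta^*)}{\partial\beta^T}\Big\}(\beta - \beta_0) = \|\beta - \beta_0\|^2\{1 + o_p(1)\}.$$

We next check the key condition for the Brouwer fixed point theorem. For any $\beta$ such
that $\|\beta - \beta_0\| = C\sqrt{d_n}\,\alpha_n$ for some constant $C$, it follows by condition (P1) that

$$(\beta - \beta_0)^T\Big\{n^{-1/2}\sum_{i=1}^n \phi^*_{\mathrm{eff},i}(\beta) - n^{1/2} q_{\lambda_n}(\beta)\Big\}
= (\beta - \beta_0)^T\Big\{n^{-1/2}\sum_{i=1}^n \phi^*_{\mathrm{eff},i}(\beta_0) - n^{1/2} q_{\lambda_n}(\beta_0)\Big\} + n^{1/2}\|\beta - \beta_0\|^2\{1 + o_p(1)\}.$$

Using the Cauchy-Schwarz inequality, it can be shown that the first term in the above
equation is of order $\|\beta - \beta_0\|\, O_p(\sqrt{d_n + d_n n a_n^2}) = O_p(C n^{1/2} d_n \alpha_n^2)$. Note that
$n^{1/2}\|\beta - \beta_0\|^2 = C^2 n^{1/2} d_n \alpha_n^2$. Thus the second term in the above equation dominates the first
term with probability $1 - \epsilon$ for any $\epsilon > 0$ as long as $C$ is large enough. Thus, for any $\epsilon > 0$
and for large enough $C$, the probability for the above display to be larger than zero is at
least $1 - \epsilon$. From the Brouwer fixed point theorem, we know that with a probability of at
least $1 - \epsilon$, there exists at least one solution for (A1) in the region $\|\beta - \beta_0\| \le C\sqrt{d_n}\,\alpha_n$. □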



Lemma on sparsity for Theorem 2.

Lemma 1. If the conditions in Theorem 2 hold, then for any given $\beta$ that satisfies
$\|\beta - \beta_0\| = O_p(\sqrt{d_n/n})$, with probability tending to 1, any solution $(\beta_I^T, \beta_{II}^T)^T$ of (2.2)
satisfies that $\beta_{II} = 0$.

Proof. Denote the kth element in $\sum_{i=1}^n S^*_{\mathrm{eff}}(W_i, Z_i, Y_i, \beta)$ as $L_{nk}(\beta)$, $k = d_1 + 1, \ldots, d_n$.
We next show that the order of $L_{nk}(\beta)$ is $O_p(\sqrt{n d_n})$,



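$$L_{nk}(\beta) = L_{nk}(\beta_0) + \sum_{j=1}^{d_n} \frac{\partial L_{nk}(\beta_0)}{\partial\beta_j}(\beta_j - \beta_{j0}) + 2^{-1}\sum_{j=1}^{d_n}\sum_{l=1}^{d_n} \frac{\partial^2 L_{nk}(\beta^*)}{\partial\beta_j\,\partial\beta_l}(\beta_j - \beta_{j0})(\beta_l - \beta_{l0}), \qquad (A2)$$

where $\beta^*$ is between $\beta$ and $\beta_0$. Because of condition (A2), the first term of (A2) is of
order $O_p(n^{1/2}) = O_p(\sqrt{n d_n})$. The second term in (A2) can be further written as

$$\sum_{j=1}^{d_n}\Big\{\frac{\partial L_{nk}(\beta_0)}{\partial\beta_j} - E\frac{\partial L_{nk}(\beta_0)}{\partial\beta_j}\Big\}(\beta_j - \beta_{j0}) + \sum_{j=1}^{d_n} E\Big\{\frac{\partial L_{nk}(\beta_0)}{\partial\beta_j}\Big\}(\beta_j - \beta_{j0}). \qquad (A3)$$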

Using the Cauchy-Schwarz inequality and condition (Al), it can be shown by straight- 
forward calculation that the first term in (A3) is of order Op(-\/c?„/n) = Op{^/ndn). Using 
the Cauchy-Schwarz inequality again, the second term in (A3) is controlled by



0=1 



eff,k 



1/2 



W~Po\\<nXn 



E- 



eff,k 



11/3 - All =Op(V^)- 



Thus, the second term of (A2) is of order Op{^/ndn). As for the third term of (A2), we can 
have a similar decomposition to that of (A3). Using the Cauchy-Schwarz inequality in 
matrix form and condition (A3), it can be shown that the third term of (A2) is of order 
Opid-n) + Op("-~^^^'^n) ~ OpiVndn) as d^/n ^ 0. Thus, LnkW) is of order Op{\/nd^). 
Hence we have 



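$$L_{nk}(\beta) - n p'_{\lambda_n}(\beta_k) = -\sqrt{n d_n}\,\big\{\sqrt{n/d_n}\; p'_{\lambda_n}(|\beta_k|)\,\mathrm{sign}(\beta_k) + O_p(1)\big\}.$$

Using condition (2.6), the sign of $L_{nk}(\beta) - n p'_{\lambda_n}(\beta_k)$ is decided by $\mathrm{sign}(\beta_k)$ completely
when n is large enough. From the continuity of $L_{nk}(\beta) - n p'_{\lambda_n}(\beta_k)$, we obtain that it is
zero at $\beta_k = 0$. □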

Proof of Theorem 2. From Theorem 1 and condition (P1), there is a root $(n/d_n)$
consistent estimator $\hat\beta$. From Lemma 1, $\hat\beta = (\hat\beta_I^T, 0^T)^T$, so (i) is shown. Denote the first
$d_1$ equations in $\sum_{i=1}^n S^*_{\mathrm{eff}}\{W_i, Z_i, Y_i, (\beta_I^T, 0^T)^T\}$ as $L_n(\beta_I)$. Now consider solving the first
$d_1$ equations in (2.2) for $\beta_I$, while $\beta_{II} = 0$. We have



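$$0 = L_n(\hat\beta_I) - n p'_{\lambda_n,I}(\hat\beta_I) = L_n(\beta_{I0}) + \frac{\partial L_n(\beta^*_I)}{\partial\beta_I^T}(\hat\beta_I - \beta_{I0}) - n b_n - n p''_{\lambda_n,I}(\beta^*_I)(\hat\beta_I - \beta_{I0}),$$

where $\beta^*_I$ is between $\beta_{I0}$ and $\hat\beta_I$. It follows by condition (P1) that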



-i ^^«(/?lo) „ dLjfiml I, ,n . 



<2 



n — nj- 



dp] 



Op{n ^dn). 



Furthermore, for any fixed $\epsilon > 0$, it follows by conditions (A1) and (A3) and the
Chebyshev inequality that



Pr 



n — —7? -tj 



< 



dPj 

dLniP*io) 

dPj 



nE 



dPj 

dPj 



> edri 



0(d„^n-^dfn) ==0(1), 
00^



Therefore, 



i<9L„(^|o) „ 9L„(/3/o) „ , , 

^ — — Pa„j(p )-E t — +?'A„j(P/o. 



a/37 

and subsequently. 



a/37 ^^-^'^ ' 9/37 



We thus obtain that 



-E ^^fpj'^ + S„|(/3/ - /3/o) + 6„ = 7i-iL„(/3,o) + Op(n-i/2)_ 
Denote $I^* = E\{S^*_{\mathrm{eff},I}(\beta_{I0})\, S^{*T}_{\mathrm{eff},I}(\beta_{I0})\}$. Using condition (A2), it follows that



jl/2^Tj*-l/2 



ai^n(/3 
a/37 



+ S„|(/3/-/3/o)+6r, 

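Let $Y_i = n^{-1/2} v^T I^{*-1/2} S^*_{\mathrm{eff},I}(W_i, Z_i, Y_i, \beta_{I0})$, $i = 1, \ldots, n$. It follows that for any $\epsilon > 0$,

$$\sum_{i=1}^n E\|Y_i\|^2 1(\|Y_i\| > \epsilon) = n E\|Y_1\|^2 1(\|Y_1\| > \epsilon) \le n\{E\|Y_1\|^4\}^{1/2}\{\Pr(\|Y_1\| > \epsilon)\}^{1/2}.$$

Using Chebyshev's inequality, we have

$$\Pr(\|Y_1\| > \epsilon) \le \frac{E\|Y_1\|^2}{\epsilon^2} = O(n^{-1}).$$

Note that $E(\|Y_1\|^4) = n^{-2} E\|v^T I^{*-1/2} S^*_{\mathrm{eff},I}(W_1, Z_1, Y_1, \beta_{I0})\|^4$. Note that the rank of $vv^T$
is one, and hence $\lambda_{\max}(vv^T)$ equals the trace of $vv^T$. So, $\lambda_{\max}(vv^T) = 1$ as $v^T v = 1$. Thus,
it follows that

$$E(\|Y_1\|^4) = n^{-2} E\{S^*_{\mathrm{eff},I}(W_1, Z_1, Y_1, \beta_{I0})^T I^{*-1/2} v v^T I^{*-1/2} S^*_{\mathrm{eff},I}(W_1, Z_1, Y_1, \beta_{I0})\}^2$$
$$\le n^{-2} \lambda^2_{\max}(I^{*-1}) E\{S^*_{\mathrm{eff},I}(W_1, Z_1, Y_1, \beta_{I0})^T S^*_{\mathrm{eff},I}(W_1, Z_1, Y_1, \beta_{I0})\}^2$$
$$= n^{-2} \lambda^2_{\max}(I^{*-1}) E\|S^*_{\mathrm{eff},I}(W_1, Z_1, Y_1, \beta_{I0})\|^4 = O(d_n^2 n^{-2}),$$

due to condition (A2). Hence,

$$\sum_{i=1}^n E\|Y_i\|^2 1(\|Y_i\| > \epsilon) = O(n d_n n^{-1} n^{-1/2}) = o(1).$$

On the other hand,

$$\sum_{i=1}^n \mathrm{cov}(Y_i) = n\,\mathrm{cov}\{n^{-1/2} v^T I^{*-1/2} S^*_{\mathrm{eff},I}(W_1, Z_1, Y_1, \beta_{I0})\}$$
$$= v^T I^{*-1/2} E\{S^*_{\mathrm{eff},I}(W_1, Z_1, Y_1, \beta_{I0})\, S^*_{\mathrm{eff},I}(W_1, Z_1, Y_1, \beta_{I0})^T\} I^{*-1/2} v = 1.$$

Following the Lindeberg-Feller central limit theorem, the results in (ii) now follow. □
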
Regularity conditions for Theorems 3 and 4.

The notation $C_i$ below is generic and is allowed to be different from that in
conditions (A1)-(A3).

(B1) The first derivatives of $\mathcal{L}$ with respect to $\beta$ and $\theta$ exist and are denoted as $\mathcal{L}_\beta$ and
$\mathcal{L}_\theta$, respectively. The first derivative of $\theta$ with respect to $\beta$ exists and is denoted
as $\theta_\beta$. Then $E(\mathcal{L}_\beta + \mathcal{L}_\theta \theta_\beta)$ exists and its left eigenvalues are bounded away from
zero and infinity uniformly for all n at $\beta_0$ and the true function $\theta_0(Z)$. For any
entry $S_{jk}$ of the matrix $\partial(\mathcal{L}_\beta + \mathcal{L}_\theta \theta_\beta)/\partial\beta$, $E(S_{jk}^2) < C_1 < \infty$.

(B2) The eigenvalues of the matrix $E\{\mathcal{L}_i - \mathcal{U}_i(Z)\Psi_i\}\{\mathcal{L}_i - \mathcal{U}_i(Z)\Psi_i\}^T$ satisfy $0 < C_2 <
\lambda_{\min} < \cdots < \lambda_{\max} < C_3 < \infty$ for all n; for any entries $S_k$, $S_j$ in $(\mathcal{L}_\beta + \mathcal{L}_\theta \theta_\beta)$,
$E(S_k^2 S_j^2) < C_4 < \infty$.

(B3) The second derivatives of $\mathcal{L}$ with respect to $\beta$ and $\theta$ exist, the second derivatives
of $\theta$ with respect to $\beta$ exist, and the entries are uniformly bounded by a function
$M(W_i, Z_i, S_i, Y_i)$ in a neighborhood of $(\beta_0, \theta_0)$. In addition, $E(M^2) < C_5 < \infty$ for
all n, d.

(B4) The random variable Z has compact support and its density fz{z) is positive 
on that support. The bandwidth h satisfies nh^ — >■ and nh? — >■ 00. 0{z) has a 
bounded second derivative. 

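Proof of Theorem 3. Denote

$$J = \big[E\{(\mathcal{L}_\beta + \mathcal{L}_\theta \theta_\beta)|_{\beta_0, \theta_0}\}\big]^{-1}, \qquad \phi^*_{\mathrm{eff}}(\beta, \theta) = J \mathcal{L}(\beta, \theta) \qquad\text{and}\qquad q_{\lambda_n}(\beta) = J p'_{\lambda_n}(\beta).$$

Let $\alpha_n = n^{-1/2} + a_n$ and $\phi^*_{\mathrm{eff},i}(\beta, \hat\theta) = \phi^*_{\mathrm{eff}}\{W_i, Z_i, S_i, Y_i, \beta, \hat\theta(\beta)\}$. It will be shown that

$$n^{-1/2}\sum_{i=1}^n \phi^*_{\mathrm{eff},i}(\beta, \hat\theta) - n^{1/2} q_{\lambda_n}(\beta) = 0 \qquad (A4)$$

has a solution $\hat\beta$ that satisfies $\|\hat\beta - \beta_0\| = O_p(d_n^{1/2}\alpha_n)$.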

Due to the usual local estimating equation expansion, we have 

n 



(A5) 



+ 0p(n-i/2), 



which implies that $\hat\theta(z, \beta_0) - \theta_0(z) = O_p(h^2 + n^{-1/2}h^{-1/2})$. For any $\beta$ such that
$\|\beta - \beta_0\| = C\sqrt{d_n}\,\alpha_n$ for some constant $C$, we obtain the expansion



n 

= n-i/2^</>:if,.{/3o,^(/3o)} 

n 



1 " 



(/3-/3o), 



/3* 



where $\beta^*$ is between $\beta$ and $\beta_0$. Because of condition (B3), each component of the last term
is uniformly of order $O_p(n^{1/2}\|\beta - \beta_0\|^2)$. The second term can be written as
$n^{1/2}\{1 + o_p(1)\}(\beta - \beta_0)$ under conditions (B1), (B3) and (B4). The first term can be further
expanded as



-1/2 



i=l 
i=l 

under conditions (B3) and (B4). Summarizing the above results, making use of (A5), we 
obtain 
i=1

-'^ '^2^^ dffT j-^^ +0,(1) 

n n 

= ^ 0;^^^(/3o,^o) + n^'^P - Po) - n'^'^ ^ JU{Z^^,{PM + Op(l) 

'i=l i=l 

under condition (B4). Similar to the situation in Theorem 1, under condition (P1), we
further obtain

(/? - /3o)^ I n- V2 |j ^) _ „i/2,^^^ (^) I 

= (/3 - /?o)^<^ ^ 0:^,,(/3o, 0o) - n^'\'x„ (/^o) - n'^/^ ^ JZ^(Z,)*,(/?o, ^o) (A6) 



1=1 



pi 

The first term in the above display is of order $O_p(C n^{1/2} d_n \alpha_n^2)$ and the second term equals
$C^2 n^{1/2} d_n \alpha_n^2$, which dominates the first term as long as $C$ is large enough. The last term
is dominated by the first two terms. Thus, for any $\epsilon > 0$, as long as $C$ is large enough, the
probability for the above display to be larger than zero is at least $1 - \epsilon$. From Brouwer's
fixed point theorem we know that with a probability of at least $1 - \epsilon$, there exists at least
one solution for (A4) in the region $\|\beta - \beta_0\| \le C d_n^{1/2}\alpha_n$. □



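Lemma for Theorem 4.

Lemma 2. If the conditions in Theorem 4 hold, then for any given $\beta$ that satisfies
$\|\beta - \beta_0\| = O_p(\sqrt{d_n/n})$, with probability tending to 1, any solution $(\beta_I^T, \beta_{II}^T)^T$ of (2.2)
satisfies that $\beta_{II} = 0$.

Proof. Denote the kth equation in $\sum_{i=1}^n \mathcal{L}_i\{\beta, \hat\theta(Z_i)\}$ as $L_{nk}(\beta, \hat\theta)$ and that in
$\sum_{i=1}^n \mathcal{U}(Z_i)\Psi_i(\beta_0, \theta_0)$ as $G_{nk}(\beta_0, \theta_0)$, $k = d_1 + 1, \ldots, d_n$; then the expansion in Theorem 3 leads to

$$L_{nk}(\beta, \hat\theta) - n p'_{\lambda_n}(\beta_k) = L_{nk}(\beta_0, \theta_0) - G_{nk}(\beta_0, \theta_0) + n\sum_{j=1}^{d_n} (J^{-1})_{kj}(\beta_j - \beta_{j0}) - n p'_{\lambda_n}(|\beta_k|)\,\mathrm{sign}(\beta_k) + O_p(\sqrt{n d_n}).$$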






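Similar to the derivation in Lemma 1, the first three terms of the above display are all
of order $O_p(\sqrt{n d_n})$, hence we have

$$L_{nk}(\beta, \hat\theta) - n p'_{\lambda_n}(\beta_k) = -\sqrt{n d_n}\,\big\{\sqrt{n/d_n}\; p'_{\lambda_n}(|\beta_k|)\,\mathrm{sign}(\beta_k) + O_p(1)\big\}.$$

Because of (2.6), the sign of $L_{nk}(\beta, \hat\theta) - n p'_{\lambda_n}(\beta_k)$ is decided by $\mathrm{sign}(\beta_k)$ completely. From
the continuity of $L_{nk}(\beta, \hat\theta) - n p'_{\lambda_n}(\beta_k)$, we obtain that it is zero at $\beta_k = 0$ with a probability
larger than any $1 - \epsilon$. □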

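Proof of Theorem 4. (i) immediately follows by Lemma 2. Denote the first $d_1$ equations
in $\sum_{i=1}^n \mathcal{L}_i\{(\beta_I^T, 0^T)^T, \hat\theta\}$ as $L_n(\beta_I, \hat\theta)$ and that in $\sum_{i=1}^n \mathcal{U}(Z_i)\Psi_i(\beta_0, \theta_0)$ as
$G_n(\beta_{I0}, \theta_0)$. Note that the $d_1 \times d_1$ upper left block of $J^{-1}$ is the matrix $A$ defined in
Theorem 4. Using the Taylor expansion for the penalized estimating function at $\beta = (\beta_{I0}^T, 0^T)^T$,
the first $d_1$ equations yield

$$0 = L_n\{\hat\beta_I, \hat\theta(\hat\beta_I)\} - n p'_{\lambda_n,I}(\hat\beta_I)$$
$$= L_n(\beta_{I0}, \theta_0) - G_n(\beta_{I0}, \theta_0) + nA(\hat\beta_I - \beta_{I0}) - n b_n - n\{\Sigma_n + o_p(1)\}(\hat\beta_I - \beta_{I0}) + o_p(d_n^{1/2} n^{1/2})$$
$$= L_n(\beta_{I0}, \theta_0) - G_n(\beta_{I0}, \theta_0) + n(A - \Sigma_n)\big[\hat\beta_I - \beta_{I0} - (A - \Sigma_n)^{-1} b_n\big] + o_p(d_n^{1/2} n^{1/2}).$$

Using condition (B2), we have

$$n^{1/2} v^T B^{-1/2}\{(-A + \Sigma_n)(\hat\beta_I - \beta_{I0}) + b_n\}$$
$$= n^{-1/2} v^T B^{-1/2}\{L_n(\beta_{I0}, \theta_0) - G_n(\beta_{I0}, \theta_0)\} + o_p(v^T B^{-1/2})$$
$$= n^{-1/2} v^T B^{-1/2}\{L_n(\beta_{I0}, \theta_0) - G_n(\beta_{I0}, \theta_0)\} + o_p(1).$$

Let $Y_i = n^{-1/2} v^T B^{-1/2}\{\mathcal{L}_{i,I}(\beta_{I0}, \theta_0) - \mathcal{U}_{i,I}(Z_i)\Psi_i(\beta_{I0}, \theta_0)\}$, $i = 1, \ldots, n$. It follows that
for any $\epsilon > 0$,

$$\sum_{i=1}^n E\|Y_i\|^2 1(\|Y_i\| > \epsilon) = n E\|Y_1\|^2 1(\|Y_1\| > \epsilon) \le n\{E\|Y_1\|^4\}^{1/2}\{\Pr(\|Y_1\| > \epsilon)\}^{1/2}.$$

Using the Chebyshev inequality, we have $\Pr(\|Y_1\| > \epsilon) = O(n^{-1})$ and $E(\|Y_1\|^4)$ is
bounded by

$$n^{-2}\lambda^2_{\max}(B^{-1}) E\big[\{\mathcal{L}_{1,I}(\beta_{I0}, \theta_0) - \mathcal{U}_{1,I}(Z_1)\Psi_1(\beta_{I0}, \theta_0)\}^T \{\mathcal{L}_{1,I}(\beta_{I0}, \theta_0) - \mathcal{U}_{1,I}(Z_1)\Psi_1(\beta_{I0}, \theta_0)\}\big]^2,$$

which equals $n^{-2}\lambda^2_{\max}(B^{-1}) E\|\mathcal{L}_{1,I}(\beta_{I0}, \theta_0) - \mathcal{U}_{1,I}(Z_1)\Psi_1(\beta_{I0}, \theta_0)\|^4 = O(d_n^2 n^{-2})$ by
condition (B2). Hence,

$$\sum_{i=1}^n E\|Y_i\|^2 1(\|Y_i\| > \epsilon) = O(n d_n n^{-1} n^{-1/2}) = o(1).$$
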

i=l 
On the other hand,

$$\sum_{i=1}^n \mathrm{cov}(Y_i) = n\,\mathrm{cov}\big[n^{-1/2} v^T B^{-1/2}\{\mathcal{L}_{1,I}(\beta_{I0}, \theta_0) - \mathcal{U}_{1,I}(Z_1)\Psi_1(\beta_{I0}, \theta_0)\}\big] = 1.$$

(ii) follows by the Lindeberg-Feller central limit theorem. □